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Abstract. Often the result of a scientific experiment is given by the difference of 
measurements in two configurations, denoted by A and B. Since the measurements 
are not obtained simultaneously, drift of the zero-point can bias the result. In practice 
measurement patterns are used to minimize this bias. The time sequence AB followed 
by BA, for example, would cancel a linear drift in the average difference A — B. We 
propose taking data with an alternating series ABAB.., and removing drift with a 
post-hoc analysis. We present an analysis method that removes bias from the result 
for drift up to polynomial order p. A statistical cost function c{N) is introduced to 
compare the uncertainty in the end result with that from using a raw data average. 
For a data set size > 30 the statistical cost is negligible. For A^ < 30 the cost is 
plotted as a function of A^ and filter order p and the trade off between the size of the 
data set and p is discussed. 

Keywords comparison measurements, cycles of measurements, zero-point drift, filter, 
estimator, ABA method, statistical cost 

1. Motivation and introduction 



The accuracy and precision of a measurement are reduced by drifts in the offset and 
sensitivity of the instrument used to collect the data. The offset or zero-point is 
additive and independent of the magnitude of the value being measured. If uncorrected, 
it directly biases the result of the measurement. Sensitivity on the other hand is 
multiplicative and affects the output of the instrument more for larger quantities 
than for smaller ones. Sensitivity can be measured and any variation corrected by 
repeated or even continuous calibration of the instrument. The offset can be eliminated 
using modulation, i.e. switching the measurement setup such that the quantity being 
measured alternates in sign with measurement states A and B. As long as the zero-point 
does not change over the time it takes to make the measurement, the result determined 
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from the difference of these states is independent of the offset. As an example, a 
technique employed by many laboratories uses a single pan balance for making mass 
measurements. The sample A and a known calibrated mass B are alternately placed 
on the balance's pan and their values read out. The mass difference, obtained by 
subtracting two subsequent readings, and the known mass B give the mass of A. An 
offset that is constant in time does not bias the mass difference. The result can be 
made less dependent on the sensitivity of the balance by choosing a calibration weight 
whose mass is nearly identical to that of sample. Alternately, this can be viewed as 
modulation where the measurement result is half the difference of the measured masses 
and the zero-point is their average. 

If the time required to make an AB comparison is sufficiently long compared to the 
drift rate, the assumption of an unchanging zero-point is no longer valid. In this case the 
result includes a systematic bias from the zero-point drift. This problem is more acute 
for measurements taken over very long times due to the 1/f behavior of noise. The 
term 1/f noise reflects the fact that the power spectral density of the noise is inversely 
proportional to the frequency |T]. Measurements made over long time intervals would 
experience larger fluctuations in the zero-point than those over short time intervals. As 
a consequence, an ensemble of mean values taken at different times has more scatter 
than the errors of the means themselves. The presence of 1// noise has been shown to 
be fully consistent with a stationary distribution [2] and this extra noise can be added 
to the standard deviation of the distribution. 

The technique described in the above example is not limited to mass metrology. 
Many precision experiments change one critical control parameter or a part of the 
geometry to achieve modulation in the experiment's output. Examples include 
measurements of the gravitational constant [3J, the search for parity violation in 
nucleon-nucleon interactions |31 15], the determination of Planck's constant using a Watt 
balance |6] and searches for equivalence-principle violations |7| . As the examples indicate 
square wave modulation is a powerful tool used for Null and non-Null experiments. The 
lock-in amplifier is able to recover noisy signals using similar techniques. 

Consider a series of measurements,Mj, spaced equally in time where an observable rj 
changes sign with each measurement, zo{i) denotes the zero-point that varies slowly with 
i, and Ei is a random variable that adds noise to the measurement. The expectation 
value of e is E{e) = and the noise is assumed to be uncorrelated and stationary, 
Cov{ei,ej) = 6ija^. The time series can be written as 



where i runs from 1 to N, which is assumed to be even. An estimator S of the signal 
can be obtained by multiplying the time series by (—1)* and subsequent averaging. 




(1) 




(2) 



The expectation value and the variance of the estimator are 



(3) 
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Var(S) = E{zof^) - {Ez^if + 



(4) 



The signal rj is recovered, however it is biased by the mean value of the slope of the 
varying background, zqI = ZQ{i + 1) — -2o(0- The variance of the estimator contains 
the variance of the measurement noise plus variations about the mean slope of zq{i). 
This later part limits the ultimate statistical precision of the measurement. 

Various patterns for taking data are known to completely cancel linear and even 
higher order drifts in calculations of the mean [HI [9]. For example, the sequences 
ABA and ABBA both remove linear drift terms. ABBA with 4 measurements 
Ui,M2,M3,M4, recovers the signal by computing the mean according to the expression 
^4 = \{ui — U2 — (ms — 'U4)). Assuming uncorrelated data {Cov{ui,Uj) = Sijcr"^), the 
variance of this estimator is given by Var(S'4) = cr^/4. Averaging of these S4 
results gives an overall variance of a^/N. In the ABA sequence, the estimator is given 
by S3 = \{ui — 2U2 + U3) with Var(S'3) = 3cr^/8. Averaging A^/3 of these S3 results leads 
to a variance of {9/8)a'^/N. In general, the smallest possible variance is achieved when all 
measurements are weighted with equal absolute values. When comparing different drift 
cancellation methods a useful figure of merit is the relative increase in the uncertainty 
of the result Sk over the minimum variance a/\/N. Thus we define the statistical cost 



Comparing the statistical cost values, 0.06 for ABA and for ABBA, shows the latter 
to be better. Strictly speaking, the data set size N should be divisible by both 3 
and 4 when directly comparing these two methods. Although by this metric 6*4 is the 
best unbiased estimator in the presence of linear drift, the ABBA pattern has several 
disadvantages: (1) The series is asymmetric. The first B occurs after a configuration 
change, whereas no such change occurs for the second B measurement. It is reasonable 
to assume the two B measurements have different uncertainties and drift properties. (2) 
The measurement pattern is fixed in advance. If the data exhibits additional non-linear 
drift, removing it would require a new measurement with a more appropriate pattern. 
(3) Drift cancellation only occurs if all measurements of the series are used. If one 
measurement is missing or affected by an out-lier, the remaining measurements in that 
sequence would have to be discarded. 

In this article we propose an alternative approach that avoids the above 
shortcomings. Data is taken in an alternating (square wave) sequence ABAB... without 
regard for any inherent drift. An estimator is then constructed that removes drift 
up to any desired polynomial order p where p < N. When applied to the data it yields 
an expectation value for the mean independent of the drift orders selected. This plays 
the same roll as the data taking sequences discussed above where linear drift is removed. 
The statistical cost associated with increases with increasing p and decreases with 
increasing N. A cost - benefit analysis can help choose appropriate values for p and A^. 

The next two sections contain the mathematical justification for the methods used. 
The construction of the estimator F^ is described in two steps. First, we introduce a 



of Sk as 
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filter that removes polynomial drift from the input data. The output of this filter is 
a shorter time series without drift to the order selected. Next, the filtered data are 
averaged to a single number which is the best estimate of the signal. In practice, the 
estimator is used to generate a set of weights that are applied to the entire data set. 
These products are then summed together for the result. 



2. Design of a filter that removes polynomial drift from data 



The filter is a linear combination of consecutive terms in the original data series 

Ml, U2, .., Mat. a new series yi, 1/2, ...,yN-p-i is calculated as 

p 

yi = Yl CkUi+k (5) 

A:=0 

The CfcS are chosen to cancel variations across the included points that are less than a 
given polynomial order p and preserve a signal that changes sign with each successive 
term. These design requirements for the filter can be met by solving the matrix equation 
/II 1 1 ■■■ l\/Co\ /0\ 








2 
4 



3 
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p2 



J 



C2 
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p-i 










V 1 / 



(6) 



1 ■■■ f-^ 

V 1 -1 1 -1 ... (-l)P 
for Cfc. The first p rows cancel variations with polynomial orders up to p — 1 including 
the zeroth order, i.e. offset. The last row normalizes the part of ui that changes sign 
with each successive i. The p + 1 equations uniquely determine p + 1 coefficients of the 
sum in equation ([5]). 

An alternate analytical expression for these coefficients can be obtained from the 
generating function F(^z) = (1 — z"^^ where z is complex. In the limit \z\ — )■ 1 it can 
be seen that the derivatives of Fi^z^ less than order p vanish. This has a correspondence 
with equation ([H]), where in the first p rows the sum of each successive polynomial order 
is required to vanish. The binomial expansion of Fi^z^ gives 



F{z) 



'1 



-Up 




(7) 



is therefore the Z-transform of a discrete time series /[A;] with the property that 
the sum of terms equal zero for j < p. By taking the Z-transform of both sides 

of equation ([5]) its transfer function Fl{z) can be computed, 

Y{z\ P 



E{z\ 



(8) 



Equating H{z) to F{z) and normalizing with 1/2^ to preserve the signal amplitude, 
the coefficients 

(-1)^ / V \ 
2P \ k I 



(9) 
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are obtained. 

Substituting these coefficients into equation (j5]) one obtains 

i/^(p) = ^E(-ir'(^)«^+^- (10) 

An additional factor of ( — 1)* is included here to demodulate the alternating signal 77 
found in equation ([1]). 

Equation (ITO!) is a discrete convolution of the input data Ui with a finite input 
response (FIR) ffiter kernel. The filter described here has the property of removing 
polynomial drifts. In principle, any (FIR) filter could be used, and another may be 
more efficient in environments where the functional form of the drift is known. 

To properly account for uncertainties it is advantageous to express the convolution 
as a hnear transformation of the input data set U = ui,..,un to a ffitered data set 
Y = yi, ..,yN-p- 

Y = AU (11) 

The matrix A has dimensions {N — p) x N as incompletely defined terms in the 
convolution are discarded. The elements of each row are taken from the coefficients 
in equation (ITO!) : 



aij = — I , , j (—1)"' for < j — i < p, zero otherwise. (12) 




Adjacent values in the ffitered data set are not statistically independent, since filtering 
introduces correlations between the elements of Y. The covariance matrix of the filtered 
data can be calculated from the covariance matrix of the input data Cov(f/, U) using, 

C = Cov(r, Y) = ACoy{U, U)A^. (13) 

We make the assumption that the Ui values stem from an uncorrelated and stationary 
distribution with sample variance (See the 1/f noise discussion in section [1]). The 
covariance matrix Cov{U, U) can then be written as the N x N identity matrix scaled 
by 0"^, however, the calculations presented here do not require a priori knowledge of a. 
The elements of C are given by. 



^2 N-p 



C. 



V 



— airttjr (14) 



4p 



=0 





(15) 



Cjj = — ( . , ) for —p < i — j < p, zero otherwise. (16) 



p 

which simplifies using Vandermonde's identity and a change of variables to 
a^f 2p ' 
4P \p + i-j^ 

Every covariance matrix is positive semi definite and symmetric, i.e. C = [TO] . 
Here, the latter is apparent from the inherent symmetry of the binomial coefficients. 
The covariance matrix C is no longer diagonal and the degree of correlation between 
the ffitered data points is given by the off-diagonal elements. 
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3. Extracting the signal from drift-free data 

The fihered data set, Y, contains the demodulated difference signal rj, which can be 
retrieved using linear regression to estimate its mean value fi. Given the form of the 
covariance matrix, the best linear unbiased estimator (BLUE) is obtained using Aitken's 
generalized least squares procedure [HI [12] • 

where the design matrix X, is a column vector of length N — p with all elements equal 
to one. It should be noted that the matrix expressions in the denominators evaluate to 
scalar values. For notational purposes, we define the estimator of our signal fi to be 
in order to reflect its two parameter nature {N data points and drift reduction up to 
order p). Since the result is independent of the data uncertainty cr^, it can be expressed 
in terms of known quantities, 

X^(AA^)~^ 

= x^iAA^y^x^- ^^^^ 

Replacing Y by AU, a weight vector V can be defined which allows calculation of the 
result directly from the input data set. The estimate for the signal can be obtained in 
one simple step, the scalar product, 

F', = VU wth V = ^^J^^A. (19) 

The weight vector V is unique to the values of and p but can be reused as long as 
the length of the data set and the desired order of drift suppression remain unchanged. 

4. The variance of the estimator 

The variance of is calculated using error propagation. Let the factor multiplying Y 
in equation ( ITSll be 

W = ^ . (20) 

x^iAA^y^x ^ ' 

It then follows that the variance of F^ is given by 

Var(F^) = WCov{Y,Y)W^, 
which simplifies to 

Va.(F») ^ ^.(/;.)_.^ . (21) 

The variance of the estimator is the inverse of the sum of all elements of the inverse 
correlation matrix scaled by cr^. If the value of is not known, one estimate for Var(F^) 
can be obtained by calculating the scatter in the estimator for a number of different 
data segments each of length N. If only a single set of N measurements is available, cr^ 
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can be estimated from the sum of squares of the residuals about the regression mean /i 
as shown. The quantity 

{Y - fiXfC~\Y - fiX) 

follows a x^-distribution which has a mean value equal to the number of degrees of 
freedom u = N ~ p — 1. Hence, 

s^ = {Y- i2Xf{AA^)-\Y - I2X)/{N - p - 1) (22) 

is an estimator of a^. Using equation fl2T]) the variance in the mean of the filtered data 
can be compared to that of the un-filtered input data. The corresponding variance in the 
mean of un-correlated data points is a'^N~^ where the data variability cr^ is common 
to both. Using the ratio of their variances, the statistical cost function becomes 

^"^' = /iwp^-'- 

This gives the relative increase in statistical error resulting from the smaller correlated 
data set (A^ — p) obtained in equation f|TT]) . 

Figure [1] shows the statistical cost for three different filters labeled by their order p 
versus the number of data points A^. Cost values for a fixed value of p oscillate between 
the upper and lower edges of their respective shaded bands with increasing N. The 
upper cost value for a filter of order p is the same as the lower value for a filter with 
p+l. 
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Figure 1. The statistical cost is shown as a percentage versus the number of data 
points. The shaded bands are included to guide the eye and correspond to three 
different order filters. The cost values for a given filter order p oscillate between the 
margins of their respective band with increasing N. This trajectory is shown using 
arrows for the first few points of the p = 3 filter. 
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This oscillation can be understood as follows: Each i/i is a particular linear 
combination of the raw data as given in equation f lTOj) . If they have the same weighting 
factor, we can sum two at a time where their i values differ by m. 

y.{p)+y.^Up) = ^E(-l)^^' {I) iu^+k + (24) 

If m is odd this is equivalent to applying the filter to the difference of the two points. 
The filter removes variations across the points that go as i'^ where q is some power less 
than p. The difference (i + my — i'^ is always one order less than q. In the process of 
taking the mean, for those points whose index differs by an odd number, the data is 
effectively filtered to the next higher order than the one chosen. In computing the mean, 
the Hi values are weighted by the vector W given in equation ( |20|) . Elements of this 
weight vector are each a sum over the corresponding column of the inverse correlation 
matrix (AA^)~^. Since the correlation matrix is symmetric about its diagonal it follows 
that the inverse correlation matrix is symmetric as welllil The weight vector therefore 
has this same symmetry about its center. If the length of the weight vector is even, 
then the positions of its elements that are equal in value differ by an odd number and 
the polynomial order filtered increases by 1. The oscillating behavior in the variance 
reflects this additional cancellation. When choosing either the filter order or the length 
of the data set, the statistical error is smallest when N — p is odd. 

The statistical precision of the mean of a measurement series scales as a/y/N 
where N is the number of measurements. If all measurements come from a population 
with variance known a priori to be small enough, the desired statistical precision can 
be achieved with just a few measurements. In this case drift suppression is best 
achieved with sequence patterns as discussed in section [H In all other cases precision 
measurements require N sufficiently large that the increase in error from the filter would 
be minimal. In measurement scenarios with two different outcomes, equal amounts 
of data are taken in each state to insure an unbiased result. The lowest order filter 
consistent with even values of has p = 3. It can be seen from figure [1] that after 8 
measurements the increase in statistical error is below a few percent. If only linear drift 
were removed, this same statistical cost could be achieved with even fewer measurement. 
If 20 or more measurements are available the increase in error is minimal for all filter 
orders shown. 

This analysis method can also be used to characterize the zero-point drift of the 
instrument. We start with the null hypothesis, i.e. that no significant drift exists so the 
results would be the same whether or not the filter is used. Without loss in generality 
we assume an even N. Hence, we choose to compare this with a filter of order p = 3 to 
achieve the smallest error. An estimator for un-filtered data can be obtained by letting 
p — )■ 0. In this case elements of the A matrix {N x N ) are given by aij = {—lySij and 
the design matrix X has length N. Since both the filtered and un-filtered estimators 
are calculated from the same raw data, the null hypothesis can be checked. Using 

X This follows from a property of invertible matrices (C^^)^ — (C-^)^^ 
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equation fl22p an estimator of the standard deviation for both cases can be obtained, 
■Sp=o ^p=3- Muhiplying these estimators by their degrees of freedom u = N — p — 1, 
yields variables that follow x^-distributions with — 1 and — 4 degrees of freedom 
respectively. In each case the probability is calculated and the null hypothesis is 
accepted if the probability of the filtered data is not significantly larger than that of 
the un-filtered data. 

5. Conclusion 

We have shown how zero-point drift can introduce a bias into the outcome of a 
measurement. Difference measurements in particular are sensitive to drifts, especially 
when the time required for a measurement is long. If there are two states A and B, 
choosing specific measurement sequences like ABBA can completely eliminate this bias 
from the difference. 

The alternative method proposed here acquires data with a square wave modulation 
sequence ABAB... and recovers the signal with a post-hoc analysis. It is equally 
effective in removing linear drift and, in addition, can be configured to suppress zero- 
point drift of any desired polynomial order. Besides the simplicity of the measurement 
sequence, it offers the following advantages: (1) The measurement series is symmetric 
as every measurement is performed after the state has changed, (2) the highest order 
drift suppressed can be selected after the data has been successfully taken, and (3) a 
missing data point has only a minor effect on drift suppression. Its main disadvantage 
is that it results in a slightly higher uncertainty in the end result when compared with 
schemes like ABBA where data points are weighted with equal absolute values. The 
relative amount by which the uncertainty is increased depends on the number of data 
points and the order of the polynomial drift suppressed (< p) but is typically less 
than a few percent as shown in figure [H 

In the analysis process the set of weights (weight vector) used to average the 
ABAB... data series is calculated according to the prescription given for the estimator 
F^. Applying the method is straight forward due to the use of matrix equations. 
Typically the filter would be chosen to have an order p of 2 or 3 depending on whether 
the data set size A^ is odd or even, respectively. With p and A^ known one need only 
construct the appropriate A matrix and design matrix X. There are software packages 
available such as MatLab, Mathematica, Wavemetric's IGOR, and Python with SciPi 
and NumPi to perform the matrix operations. The estimator of the result is then the 
scalar product of the data vector with the weight vector. 

We have chosen a polynomial expansion for the drift because the lowest order 
term contributes directly as a bias offset of the measurement. There may be specific 
environments where a different expansion would lead to smaller errors with fewer terms. 
The matrix equations have been kept sufficiently general as to allow incorporating other 
functional forms of the drift. We note that the formalism presented here can be adopted 
to accommodate irregularly-spaced data. In this case each row of matrix. A, has to be 
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changed to meet the irregularly-spaced analog of equation (jS]). 

In conclusion, we believe that the filter presented here in combination with 
an ABAB... data sequence is an elegant and powerful tool for obtaining accurate 
measurements in the presence of drift. 
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Appendix A. An example calculation 

In this example we choose a filter of order p = 2 as a closed form expression for the 
weight vector V for odd can be written as 

V = (^—,^^,^—,^^,...^—] for AT > 3. (A.l) 

The estimator of the signal is given by the dot product of V with the data vector U. 
The reader is encouraged to calculate the mean with various data sets of his or her 
choosing. For example: 

= (1, 1,1,1,... 1) constant offset = VU = 

C/'^ = (1,2, 3, 4,... AT) linear drift = VU = 

= (1, 4, 9, 16, . . . A^^^) quadratic drift F^ = VU ^ 

U^ = [d, -d, d,-d...d) signal F^ = VU = d 
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The variance of the estimator is given by the length of the weight vector squared times 
the variance of the original data, Var(F^) = VV^a"^. In this example, p = 2 and odd 
A^, the variance calculates to Var(F^) = a'^N/{N'^ — 1), which should be compared 
with (J^/iV, the smallest possible variance for an estimator of the signal. The reader is 
encouraged to add Gaussian noise to the data examples above to numerically verify the 
variance of the estimator. 

An interval of 1// noise can be expanded in a polynomial series and this result 
shows terms above the filter cutoff bias the mean value. An ensemble of mean values 
taken at different times will tend to scatter more than the errors of the means themselves. 



